
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization

International Bureau

(43) International Publication Date
11 October 2001 (11.10.2001)







(10) International Publication Number

WO 01/76321 A1



(51) International Patent Classification: H04R 25/00

(21) International Application Number: PCT/DK01/00226

(74) Agent: LARSEN & BIRKEHOLM A/S; Skandinavisk
Patentbureau, Banegårdspladsen 1, P.O. Box 362, DK-1570
København V (DK).



(22) International Filing Date: 4 April 2001 (04.04.2001)



(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
PA 2000 00554 4 April 2000 (04.04.2000) DK



(71) Applicant (for all designated States except US): GN RESOUND A/S [DK/DK]; Markærvej 2 A, DK-2630 Taastrup
(DK).

(72) Inventors; and

(75) Inventors/Applicants (for US only): NORDQVIST, Nils,
Peter [SE/SE]; Pilvägen 48, S-191 42 Sollentuna (SE).
LEIJON, Arne [SE/SE]; Strindbergsgatan 36:5, S-115 31
Stockholm (SE).



(81) Designated States (national): AE, AG, AL, AM, AT, AT
(utility model), AU, AZ, BA, BB, BG, BR, BY, BZ, CA,
CH, CN, CO, CR, CU, CZ, CZ (utility model), DE, DE
(utility model), DK, DK (utility model), DM, DZ, EE, EE
(utility model), ES, FI, FI (utility model), GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK,
SK (utility model), SL, TJ, TM, TR, TT, TZ, UA, UG, US,
UZ, VN, YU, ZA, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published: 

- with international search report




(54) Title: A HEARING PROSTHESIS WITH AUTOMATIC CLASSIFICATION OF THE LISTENING ENVIRONMENT






(57) Abstract: The invention relates to a hearing prosthesis and method that provide automatic identification or classification of a
listening environment by applying one or several predetermined Hidden Markov Models to process acoustic signals obtained from
the listening environment. The hearing prosthesis may utilise determined classification results to control parameter values of a
predetermined signal processing algorithm or to control a switching between different pre-set listening programs so as to optimally adapt
the signal processing of the hearing prosthesis to a given listening environment.






For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.




A HEARING PROSTHESIS WITH AUTOMATIC CLASSIFICATION OF THE LISTENING 
ENVIRONMENT 


FIELD OF THE INVENTION 


The present invention relates to a hearing prosthesis and method providing automatic
identification or classification of a listening environment by applying one or several
predetermined Hidden Markov Models to process acoustic signals obtained from the
listening environment. The hearing prosthesis may utilise determined classification results
to control parameter values of a predetermined signal processing algorithm or to control a
switching between different pre-set listening programs so as to optimally adapt the signal
processing of the hearing prosthesis to a given listening environment.

BACKGROUND OF THE INVENTION 


Today's digitally controlled or Digital Signal Processing (DSP) hearing instruments are
often provided with a number of pre-set listening programs. These pre-set listening
programs are often included to provide comfortable and intelligible reproduced
sound in differing listening environments. Audio signals obtained from these
listening environments may have highly different characteristics, e.g. in terms of average
and maximum sound pressure levels (SPLs) and/or frequency content. Therefore, for
DSP-based hearing prostheses, each type of listening environment may require a
particular setting of algorithm parameters of a signal processing algorithm of the hearing
prosthesis to ensure that the user is provided with an optimum reproduced signal quality
in all types of listening environments. Algorithm parameters that typically could be
adjusted from one listening program to another include parameters related to broadband
gain, corner frequencies or slopes of frequency-selective filter algorithms and parameters
controlling e.g. knee-points and compression ratios of Automatic Gain Control (AGC)
algorithms. Consequently, today's DSP-based hearing aids are usually provided with a
number of different pre-set listening programs, each tailored to a particular listening
environment and/or particular user preferences. Characteristics of these pre-set listening
programs are typically determined during an initial fitting session in a dispenser's office
and programmed into the aid by transmitting or activating corresponding algorithms and
algorithm parameters to a non-volatile memory area of the hearing prosthesis.

The hearing aid user is subsequently left with the task of manually selecting, typically by
actuating a push-button on the hearing aid or a program button on a remote control,
between the pre-set listening programs in accordance with the current listening or sound
environment. Accordingly, when entering and leaving the multitude of sound
environments in his or her daily whereabouts, the hearing aid user may have to devote
attention to the delivered sound quality and continuously search for the best program
setting in terms of comfortable sound quality and/or the best speech intelligibility.

It would therefore be highly desirable to provide a hearing prosthesis, such as a hearing
aid or cochlear implant device, that is capable of automatically classifying the user's
current listening environment as belonging to one of a number of typical everyday
listening environments. Classification results could then be utilised in the hearing
prosthesis to adjust the algorithm parameters of the current listening program, or to switch
to another, more suitable pre-set listening program, to maintain optimum sound quality
and/or speech intelligibility for the individual hearing aid user.

In the past, attempts have been made to adapt the signal processing characteristics of a
hearing aid to the type of listening environment that the user is situated in. US 5,687,241
discloses a multi-channel DSP-based hearing instrument that utilises continuous
determination or calculation of one or several percentile values of input signal amplitude
distributions to discriminate between speech and noise input signals in the listening
environment. Gain values in the frequency channels are subsequently altered in response
to the detected levels of speech and noise. However, it is often desirable to discriminate
between subtle characteristics of the input signal of the hearing aid, not just between
speech and noise. As an example, it may be desirable to switch between an
omni-directional and a directional microphone listening program in dependence not just on the
level of background noise, but also on further signal characteristics of this background
noise. In situations where the user of the hearing prosthesis communicates with another
individual in the presence of background noise, it would be beneficial if it were possible
to identify and classify the type of background noise. Omni-directional operation could be
selected if the noise is classified as traffic noise, to allow the user to clearly hear
approaching traffic independently of its direction of arrival. If, on the other hand, the
background noise is classified as babble noise, the directional listening program
could be selected to allow the user to obtain a reproduced signal with an improved
signal-to-noise ratio during a communication with the other individual.
Such a detailed characterisation of an input signal from a listening environment may be
obtained by applying Hidden Markov Models for analysis and classification of the input
signal. Hidden Markov Models are capable of modelling stochastic input signals in terms
of both short- and long-term temporal variations, rather than just being restricted to
modelling long-term amplitude distribution statistics or average power. Hidden Markov
Models are well known in the field of speech recognition as a tool for modelling statistical
properties of stochastic speech signals. The article "A Tutorial on Hidden Markov Models
and Selected Applications in Speech Recognition", published in Proceedings of the IEEE,
Vol. 77, No. 2, February 1989, contains a comprehensive description of the application of
Hidden Markov Models to problems in speech recognition.

The present applicants have, however, for the first time applied Hidden Markov Models to
the task of classifying the listening environment of a hearing prosthesis to provide automatic
adjustment of one or several parameter(s) of a predetermined signal processing algorithm
executed in processing means of the hearing prosthesis in dependence of these
classification results.

SUMMARY OF THE INVENTION


One object of the invention is to provide a hearing prosthesis that automatically adjusts
itself to a surrounding listening environment by controlling one or several algorithm
parameters of a predetermined signal processing algorithm to allow a user to
automatically obtain intelligible and comfortable amplified sound in a variety of different
listening environments.

It is another object of the invention to provide a hearing prosthesis that continuously and
automatically classifies an input signal as belonging to one of several everyday listening
environments and indicates the classification results to processing means to allow the
latter to perform the above-mentioned control of the algorithm parameters.



DESCRIPTION OF THE INVENTION

A first aspect of the invention relates to a hearing prosthesis comprising a microphone
adapted to generate an input signal in response to receiving an acoustic signal from a
listening environment,

an output transducer for converting a processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a
predetermined signal processing algorithm and related algorithm parameters to generate
the processed output signal,

a memory area storing values of the related algorithm parameters for the predetermined
processing algorithm,

the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration $T_{\mathrm{frame}}$, and
generate respective feature vectors, $O(t)$, representing predetermined signal features of
the consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model,
$\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, b(O(t)), \alpha_0^{\mathrm{source}}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

control one or several values of the related algorithm parameters in dependence of
element value(s) of the classification vector. Thereby, characteristics of the predetermined
signal processing algorithm are adapted to the current listening environment. The at least
one Hidden Markov Model (HMM) comprises:

$A^{\mathrm{source}}$: a state transition probability matrix;

$b(O(t))$: a probability function for the input observation $O(t)$ for each state of the at least
one Hidden Markov Model;

$\alpha_0^{\mathrm{source}}$: an initial state probability distribution vector.
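As a rough illustration only, the parameter set of one model can be held in a small container and checked for consistency. The sketch assumes the discrete form described further on, in which $b(O(t))$ is replaced by an observation symbol probability matrix $B^{\mathrm{source}}$; the class name `DiscreteHMM` and its `validate` method are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    """Hypothetical container for lambda^source = {A^source, B^source, alpha_0^source}."""
    A: List[List[float]]   # A[i][j]: probability of moving from state i to state j
    B: List[List[float]]   # B[j][k]: probability of observing symbol k in state j
    alpha0: List[float]    # alpha0[i]: probability of starting in state i

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless shapes agree and every row is a probability distribution."""
        n_states = len(self.alpha0)
        if len(self.A) != n_states or len(self.B) != n_states:
            raise ValueError("A, B and alpha0 must describe the same number of states")
        if any(len(row) != n_states for row in self.A):
            raise ValueError("A must be square")
        if len({len(row) for row in self.B}) != 1:
            raise ValueError("every row of B must cover the same symbol alphabet")
        for row in [*self.A, *self.B, self.alpha0]:
            if any(p < 0.0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("each row must be non-negative and sum to 1")
```

Each row of $A^{\mathrm{source}}$ and $B^{\mathrm{source}}$, and the vector $\alpha_0^{\mathrm{source}}$ itself, must be a probability distribution, which is what `validate` enforces.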

The hearing prosthesis may be a hearing instrument or aid such as a Behind The Ear
(BTE), an In The Ear (ITE) or a Completely In the Canal (CIC) hearing aid. The input signal
generated by the microphone may be an analogue signal, or a digital signal in a multi-bit
format or in single-bit format generated by a microphone amplifier/buffer or an integrated
analogue-to-digital converter, respectively. Preferably, the input signal to the processing
means is provided as a digital input signal. Therefore, in case the microphone signal is
provided in analogue form, it is preferably converted into a corresponding digital input
signal by a suitable analogue-to-digital converter (A/D converter), which may be included
in an integrated circuit of the hearing prosthesis. The microphone signal may be subjected
to various signal processing operations such as amplification and bandwidth limiting
before being applied to the A/D converter, and to other operations afterwards, such as
decimation, before the digital input signal is applied to the processing means.

The output transducer that converts the processed output signal into an acoustic or
electrical signal or signals may be a conventional hearing aid speaker, often called a
"receiver", or another sound pressure transducer producing a perceivable acoustic signal
to the user of the hearing prosthesis. The output transducer may also comprise a number
of electrodes that may be operatively connected to the user's auditory nerve or nerves.


In the present specification and claims, the term "predetermined signal processing
algorithm" designates any processing algorithm, executed by the processing means of the
hearing prosthesis, that generates the processed output signal from the input signal.

Accordingly, the "predetermined signal processing algorithm" may comprise a plurality of
sub-algorithms or sub-routines that each perform a particular subtask in the
predetermined signal processing algorithm. As an example, the predetermined signal
processing algorithm may comprise different signal processing sub-routines such as
frequency selective filtering, single or multi-channel compression, adaptive feedback
cancellation, speech detection and noise reduction, etc.

Furthermore, several distinct selections of the above-mentioned signal processing
sub-routines may be grouped together to form two, three or more different pre-set listening
programs which the user may be able to select between in accordance with his or her
preferences.

The predetermined signal processing algorithm will have one or several related algorithm
parameters. These algorithm parameters can usually be divided into a number of smaller
parameter sets, where each such algorithm parameter set is related to a particular part
of the predetermined signal processing algorithm or to a particular sub-routine as explained
above. These parameter sets control certain characteristics of their respective sub-routines
such as corner frequencies and slopes of filters, compression thresholds and ratios of
compressor algorithms, adaptation rates and probe signal characteristics of adaptive
feedback cancellation algorithms, etc.

Values of the algorithm parameters are preferably stored temporarily in a volatile data
memory area of the processing means, such as a data RAM area, during execution of the
predetermined signal processing algorithm. Initial values of the algorithm parameters are
stored in a non-volatile memory area such as an EEPROM/Flash memory area or a battery
backed-up RAM memory area to allow these algorithm parameters to be retained during
power supply interruptions, usually caused by the user's removal or replacement of the
hearing aid's battery or manipulation of an ON/OFF switch.


The processing means may comprise one or several processors and its/their associated
memory circuitry. The processor may be constituted by a fixed-point or floating-point
Digital Signal Processor (DSP) with a single or dual MAC architecture that performs both
the calculations required in the predetermined signal processing algorithm and a
number of so-called household tasks, such as monitoring and reading values of external
interface signals and programming ports. Alternatively, the processing means may
comprise a DSP that performs number crunching, i.e. multiplication, addition, division, etc.,
while a commercially available, or even proprietary, microprocessor kernel handles the
household tasks, which mostly involve logic operations and decision making.


The DSP may be of a software programmable type executing the predetermined signal
processing algorithm in accordance with instructions stored in an associated program
RAM area. A data RAM area integrated with the processing means may store initial and
intermediate values of the related algorithm parameters and other data variables during
execution of the predetermined signal processing algorithm, as well as various other
household variables. Such a software programmable DSP may be advantageous for
some applications due to the possibility of rapidly implementing and testing modifications
of the predetermined signal processing algorithm. Clearly, the same advantages apply to
sub-routines that handle the household tasks. Alternatively, the processing means may be
constituted by a hard-wired DSP core so as to execute one or several fixed predetermined
signal processing algorithm(s) in accordance with a fixed set of instructions from an
associated logic controller. In this type of hard-wired processor architecture, the memory
area storing values of the related algorithm parameters may be provided in the form of a
register file, or as a RAM area if the number of algorithm parameters justifies the latter
solution.

According to the invention, the processing means are further adapted to segment the
input signal into consecutive signal frames of duration $T_{\mathrm{frame}}$ and generate respective
feature vectors, $O(t)$, representing predetermined signal features of the consecutive
signal frames. The feature vectors are subsequently processed with at least one Hidden
Markov Model, $\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, b(O(t)), \alpha_0^{\mathrm{source}}\}$, associated with a predetermined sound
source to determine element value(s) of a classification vector. This classification vector
indicates a probability of the predetermined sound source being active in the current
listening environment. By controlling one or several values of the algorithm parameters
related to the predetermined signal processing algorithm in dependence of element
value(s) of the classification vector, the processing of the input signal is adapted to the
listening environment in dependence of these element value(s). The consecutive signal
frames may be non-overlapping or overlapping with a predetermined amount of overlap,
e.g. overlapping by between 10 % and 50 %, to avoid sharp discontinuities at boundaries
between neighbouring signal frames and/or counteract window effects of any applied
window function, such as a Hanning window, at the boundaries. While the above-mentioned
frame segmentation of the input signal is required for the purpose of
generating the feature vectors, $O(t)$, and processing these with the at least one Hidden
Markov Model, the predetermined signal processing algorithm may process the input
signal on a sample-by-sample basis or on a frame-by-frame basis with a frame time equal
to or different from $T_{\mathrm{frame}}$.
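The frame segmentation with overlap and windowing can be sketched as follows. This is a minimal sketch, assuming a sampling rate of 16 kHz, a frame duration of 6 ms (96 samples), 25 % overlap and a symmetric Hann (Hanning) window; the sampling rate and the function names `hann_window` and `segment_frames` are illustrative assumptions, not values or names taken from the disclosure.

```python
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    """Symmetric Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def segment_frames(x: Sequence[float], frame_len: int, overlap: float) -> List[List[float]]:
    """Split x into windowed frames of frame_len samples; overlap is a fraction in [0, 1)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, round(frame_len * (1.0 - overlap)))  # samples between frame starts
    window = hann_window(frame_len)
    return [[x[start + k] * window[k] for k in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

With these assumptions, `segment_frames(signal, frame_len=96, overlap=0.25)` advances by 72 samples (4.5 ms) per frame, so one second of input yields 221 complete frames.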


The at least one Hidden Markov Model may comprise at least one discrete Hidden
Markov Model, $\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, B^{\mathrm{source}}, \alpha_0^{\mathrm{source}}\}$, wherein $B^{\mathrm{source}}$ is an observation symbol
probability distribution matrix which serves as a discrete equivalent of the general
function, $b(O(t))$, defining the probability function for the input observation $O(t)$ for each
state of a Hidden Markov Model. In this discrete case, the processing means are
preferably adapted to compare each of the respective feature vectors, $O(t)$, with a feature
vector set, often denoted a "codebook", to determine, for substantially each of the feature
vectors, an associated symbol value so as to generate an observation sequence of
symbol values associated with the consecutive signal frames. This process of determining
symbol values from the feature vectors is commonly referred to as "vector quantization".
Thereafter, the observation sequence of symbol values is processed with the at least one
discrete Hidden Markov Model, $\lambda^{\mathrm{source}}$, which is associated with the predetermined sound
source to determine the element value(s) of the classification vector.

According to a preferred embodiment of the invention, the processing means are adapted
to process the feature vectors with a plurality of Hidden Markov Models, or to process the
observation sequence of symbol values with a plurality of discrete Hidden Markov Models.
Each of the discrete Hidden Markov Models or each of the Hidden Markov Models is
preferably associated with a respective predetermined sound source to determine the
element values of the classification vector. Each element value may directly represent a
probability (i.e. a value between 0 and 1) of the associated predetermined sound source
being active in the current listening environment.
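One way to turn several discrete HMMs into such a classification vector is sketched below, purely as an assumption: each model scores the observation sequence with the scaled forward algorithm, and the resulting likelihoods are normalised across models under equal prior probabilities, so the elements lie between 0 and 1 and sum to 1. The description does not state that this normalisation is used; the function names and the dictionary layout of `models` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[List[float]], List[List[float]], List[float]]  # (A, B, alpha0)


def log_likelihood(A, B, alpha0, symbols: Sequence[int]) -> float:
    """Scaled forward algorithm: returns log P(symbols | model)."""
    n = len(alpha0)
    alpha = [alpha0[i] * B[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            s = symbols[t]
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][s] for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_p


def classification_vector(models: Dict[str, Model], symbols: Sequence[int]) -> Dict[str, float]:
    """Posterior probability of each model given the symbols, assuming equal priors."""
    logs = {name: log_likelihood(*params, symbols) for name, params in models.items()}
    peak = max(logs.values())  # assumes at least one model gives a finite likelihood
    weights = {name: math.exp(v - peak) for name, v in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

Working in the log domain with per-frame rescaling keeps the computation stable over long observation sequences, which matters on fixed-point hardware as much as in this floating-point sketch.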

The duration of one of the signal frames, $T_{\mathrm{frame}}$, is preferably selected to be within the
range 1 - 100 milliseconds, such as about 5 - 10 milliseconds. Such time durations allow
the applied Hidden Markov Model(s) to operate on time scales of the input signal that are
comparable to individual features, e.g. phonemes, of speech signals and to envelope
modulations of a number of relevant acoustic noise sources.

A predetermined sound source may be any natural or synthetic sound source such as a
natural speech source, a telephone speech source, a traffic noise source, a multi-talker or
babble source, a subway noise source, a transient noise source or a wind noise source. A
predetermined sound source may also be constituted by a mixture of natural speech
and/or traffic noise and/or babble mixed together in predetermined proportions to, e.g.,
create a particular signal to noise ratio (SNR) in that predetermined sound source. For
example, a predetermined sound source may be speech and babble mixed in a proportion
that creates a particular target SNR such as 5 dB or 10 dB, or more preferably 20 dB. The
Hidden Markov Model associated with such a mixed speech-babble sound source will
then, through the classification vector, be able to indicate how well a current input signal or
signals fit this speech-babble sound source. The processing means can consequently
select appropriate signal processing parameters based on both the interfering noise type
and the actual signal to noise ratio.
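The construction of a mixed sound source at a target SNR can be sketched as below. It assumes that SNR is defined as the ratio of mean signal powers over the whole recording and that the babble recording is scaled while the speech recording is left unchanged; `mix_at_snr` is a hypothetical helper, and a real training pipeline may define SNR differently (e.g. per frame or after frequency weighting).

```python
import math
from typing import List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so that mean speech power / mean noise power = snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise recording must be at least as long as the speech recording")
    n = len(speech)
    speech_power = sum(s * s for s in speech) / n
    noise_power = sum(v * v for v in noise[:n]) / n
    # required noise power is speech_power / 10^(snr_db / 10); scale amplitude by its square root
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]
```

For a 20 dB target, the babble is scaled so that its mean power is 1/100 of the speech power, i.e. its amplitude is reduced to one tenth of the speech RMS level.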

Temporal and spectral characteristics of each of these predetermined sound sources may
have been obtained based on real-life recordings of one or several representative sound
sources. The temporal and spectral characteristics for each type of predetermined sound
source are preferably obtained by performing real-life recordings of a number of such
representative sound sources and concatenating these recordings into a single recording (or
sound file). For speech sound sources, the present inventors have found that utilising
about 10 different speakers, preferably 5 males and 5 females, will generally provide good
classification results in the Hidden Markov Model associated with the speech source. The
mixed sound source type is preferably provided by post-processing of one or several of
the real-life recordings to obtain desired specific characteristics of the mixed sound source,
such as a predetermined signal to noise ratio.

When the concatenated sound source recording has been formed, feature vectors,
preferably identical to those feature vectors that are generated by the processing means in
the hearing prosthesis, are extracted from the concatenated sound source recording to
form a training observation sequence for the associated continuous or discrete HMM. The
duration of the training sequence depends on the type of sound source, but it has been
found that a duration of about 3 - 20 minutes, such as about 4 - 6 minutes, is adequate for
many types of sound sources including speech sound sources. Thereafter, for each
predetermined sound source, the corresponding HMM is trained with the generated
training observation sequence, preferably by the Baum-Welch iterative algorithm, to
obtain values of $A^{\mathrm{source}}$, the state transition probability matrix, values of $B^{\mathrm{source}}$, the
observation symbol probability distribution matrix (for discrete HMM models), and values of
$\alpha_0^{\mathrm{source}}$, the initial state probability distribution vector. If the HMM is ergodic, the values of
the initial state probability distribution vector are determined from the state transition
probability matrix.
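One natural reading of the last sentence, stated here as an assumption, is that for an ergodic model the initial state distribution is taken as the stationary distribution of $A^{\mathrm{source}}$, i.e. the vector $\alpha_0$ satisfying $\alpha_0 A^{\mathrm{source}} = \alpha_0$. A minimal power-iteration sketch follows; it additionally assumes the chain is aperiodic so that the iteration converges, and `stationary_distribution` is a hypothetical name.

```python
from typing import List, Sequence


def stationary_distribution(A: Sequence[Sequence[float]], tol: float = 1e-12,
                            max_iter: int = 10_000) -> List[float]:
    """Return p with p = p A for a row-stochastic A, by power iteration from the uniform vector."""
    n = len(A)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    raise RuntimeError("no convergence; the chain may be periodic or not ergodic")
```

For example, with $A = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}$ the balance condition $0.1\,p_0 = 0.5\,p_1$ gives $p = (5/6, 1/6)$.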

The feature vectors that are generated from the consecutive signal frames may represent
spectral properties of the signal frames, temporal properties of the signal frames or any
combination of these. The spectral properties may be expressed in the form of Discrete
Fourier Transform coefficients, Linear Predictive Coding parameters, cepstrum
parameters or corresponding differential cepstrum parameters.
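As an illustration of cepstrum-based features, the sketch below computes a real cepstrum (the inverse DFT of the log-magnitude spectrum) for one frame using a direct, unoptimised DFT, and forms differential (delta) cepstra as simple frame-to-frame differences. The difference definition, the magnitude floor and the function names are assumptions; the description does not fix how the differential parameters are formed, and a practical implementation would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def real_cepstrum(frame: Sequence[float], n_coeffs: int, floor: float = 1e-12) -> List[float]:
    """First n_coeffs real-cepstrum coefficients of one (windowed) frame."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        # direct DFT bin k; O(n^2) overall, acceptable for short frames
        spectrum_k = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        log_mag.append(math.log(max(abs(spectrum_k), floor)))  # floor avoids log(0)
    # the log-magnitude spectrum of a real frame is even, so the inverse DFT is a cosine sum
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(n_coeffs)]


def delta_cepstra(cepstra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Frame-to-frame differences c(t) - c(t-1); one vector fewer than the input."""
    return [[cur - prev for prev, cur in zip(prev_vec, cur_vec)]
            for prev_vec, cur_vec in zip(cepstra, cepstra[1:])]
```

Calling `real_cepstrum(frame, n_coeffs=12)` on each frame and passing the results to `delta_cepstra` would give 12-element differential feature vectors of the size mentioned in the preferred embodiment below.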

If a discrete HMM or HMMs are utilised, the codebook may have been determined by an
off-line training procedure which utilised real-life sound source recordings. The number of
feature vectors that constitutes the codebook may vary depending on the particular
application, but for hearing aid applications it has been found that a codebook comprising
between 8 and 256 different feature vectors, such as 32 - 64 different feature vectors,
usually will provide an adequate coverage of the complete feature space. The comparison
between each of the feature vectors computed from the consecutive signal frames and
the codebook provides a symbol value which may be selected by choosing an integer
index belonging to the codebook entry nearest to the feature vector in question. Thus, the
output of this vector quantization process may be a sequence of integer indexes
representing the corresponding symbol values.

To generate the codebook so as to closely resemble the feature vectors that are generated in
the hearing prosthesis during on-line processing of the input signal, i.e. normal use, the
real-life sound recordings may have been made by passing the signal through an input
signal path of a target hearing prosthesis. By adopting such a procedure, frequency
response deviations as well as other linear and/or non-linear distortions generated by the
input signal path of the target hearing prosthesis can be compensated by introducing
corresponding signal characteristics into the codebook. Thus, a close resemblance
between the feature vector set and the on-line generated feature vectors is secured to
optimise recognition and classification results from the subsequent processing in the
discrete Hidden Markov Model or Models. A similar advantageous effect may, naturally,
be obtained by pre-processing the real-life sound recordings in a manner
substantially similar to the processing of the input signal path of a target hearing
prosthesis before extraction of the feature vector set or codebook is performed. The latter
solution could be implemented by applying suitable analogue and/or digital filters or filter
algorithms to the input signal, tailored to simulate a priori known characteristics of the input
signal path in question.

While it has proven helpful to utilise so-called left-to-right Hidden Markov Models in the
field of speech recognition, where the known temporal characteristics of words and
utterances are matched in a structure of the model, the present inventors have found it
advantageous to use at least one ergodic Hidden Markov Model, and, preferably, to use
ergodic Hidden Markov Models for all applied Hidden Markov Models. An ergodic Hidden
Markov Model is a model in which it is possible to reach any internal state from any other
internal state in the model.
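The ergodicity property can be checked directly from a transition matrix. The sketch below, a hypothetical helper `is_ergodic` rather than part of the disclosure, treats every transition with non-zero probability as an edge and verifies by breadth-first search that every state can reach every other state; a left-to-right model fails the test because no path leads back to earlier states.

```python
from collections import deque
from typing import Sequence


def is_ergodic(A: Sequence[Sequence[float]]) -> bool:
    """True if every state can reach every other state via non-zero transitions."""
    n = len(A)
    for start in range(n):
        seen = {start}
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if A[i][j] > 0.0 and j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(seen) != n:  # some state is unreachable from this start state
            return False
    return True
```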

The number of internal model states of any particular HMM of the plurality of HMMs may
depend on the particular type of predetermined sound source modelled. A relatively
simple, nearly constant noise source may be adequately modelled by an HMM with only a
few internal states, while more complex sound sources such as speech or mixed speech
and complex noise sources may require additional internal states. Preferably, the at least
one Hidden Markov Model or each of the plurality of Hidden Markov Models comprises
between 2 and 10 states, such as between 3 and 8 states. According to a preferred
embodiment of the invention, four discrete HMMs are used in a proprietary DSP in a
hearing instrument, where each of the four HMMs has 4 internal states. The four HMMs
are associated with four common predetermined sound sources: speech source,
traffic noise source, multi-talker or babble source, and subway noise source, respectively.
A codebook with 64 feature vectors, each consisting of 12 delta-cepstrum parameters, is
utilised to provide vector quantisation of the feature vectors derived from the input signal
of the hearing aid. More generally, the feature vector set may comprise between 8 and 256
different feature vectors, such as 32 - 64 different feature vectors, without taking up an
excessive amount of memory in the hearing aid DSP.

The processing means may be adapted to process the input signal in accordance with at
least two different predetermined signal processing algorithms, each being associated
with a set of algorithm parameters, where the processing means are further adapted to
control a transition between the at least two predetermined signal processing algorithms
in dependence of the element value(s) of the classification vector. This embodiment of the
invention is particularly useful where the hearing prosthesis is equipped with two closely
spaced microphones, such as a pair of omni-directional microphones, generating a pair of
input signals which can be utilised to provide a directional signal mode by well-known
delay-subtract techniques and a non-directional signal mode, e.g. by processing only one
of the input signals. The processing means may control a transition between the
directional and the omni-directional mode in a smooth manner through a range of
intermediate values of the algorithm parameters so that the directionality of the processed
output signal gradually increases or decreases. The user will thus not experience abrupt
changes in the reproduced sound but rather, e.g., a smooth improvement in signal to noise
ratio.
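A smooth omni-to-directional transition of the kind described can be sketched as a cross-fade whose weight moves by a bounded step per sample. The sketch assumes a first-order delay-subtract directional signal with an integer sample delay and a linear blend between the omni-directional (front microphone) signal and the directional signal; the function names, the linear blend and the per-sample step limit are illustrative assumptions only.

```python
from typing import List, Sequence


def delay_subtract(front: Sequence[float], back: Sequence[float], delay: int) -> List[float]:
    """First-order differential (delay-subtract) signal: front[n] - back[n - delay]."""
    return [f - (back[n - delay] if n >= delay else 0.0) for n, f in enumerate(front)]


def crossfade(omni: Sequence[float], directional: Sequence[float],
              start_weight: float, target_weight: float, max_step: float) -> List[float]:
    """Blend omni and directional signals while the directional weight ramps toward a target."""
    out: List[float] = []
    weight = start_weight
    for o, d in zip(omni, directional):
        # move the weight at most max_step per sample, so directionality changes gradually
        weight += max(-max_step, min(max_step, target_weight - weight))
        out.append((1.0 - weight) * o + weight * d)
    return out
```

In such a scheme, a classification favouring babble noise might set `target_weight` towards 1.0 (directional), and one favouring traffic noise towards 0.0 (omni-directional), with `max_step` setting how quickly the change is heard.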

To control such transitions between two predetermined signal processing algorithms, the
processing means may further comprise a decision controller adapted to monitor the
elements of the classification vector and control transitions between the plurality of Hidden
Markov Models in accordance with a predetermined set of rules. The decision controller
may advantageously operate as an intermediate layer between the classification vector
provided by the HMMs and the one or plurality of related algorithm parameters. By
monitoring element values of the classification vector and controlling the value(s) of the
related algorithm parameter(s) in accordance with rules about maximum and minimum
switching times between HMMs and, optionally, interpolation characteristics between the
algorithm parameters, the inherent time scales that the HMMs operate on can be
smoothed. If, for example, a number of discrete HMMs operate on consecutive symbol
values that each represent a time frame of about 6 ms, it may be advantageous to
low-pass filter or smooth rapid transitions between a speech HMM and a babble noise HMM
that are caused by pauses between words in conversational speech in a "cocktail party"
type listening environment. Instead of performing an instantaneous switch between the
two predetermined signal processing algorithms for every model transition, suitable time
constants and hysteresis could be provided in the decision controller.
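A decision controller with the time constants and hysteresis mentioned above could look like the following Python sketch. It is hypothetical throughout: the class name, the one-pole smoothing coefficient `alpha`, the two hysteresis thresholds and the minimum hold time `min_hold_frames` are illustrative values, not values given in the patent.

```python
class DecisionController:
    """Hypothetical decision controller between the HMM classifier and the
    algorithm parameters. It low-pass filters one classification element
    (here a per-frame speech probability), applies hysteresis thresholds and
    enforces a minimum number of frames between switches."""

    def __init__(self, alpha=0.05, on_threshold=0.7, off_threshold=0.3, min_hold_frames=50):
        self.alpha = alpha                    # one-pole low-pass coefficient per frame
        self.on_threshold = on_threshold      # smoothed value needed to enter "speech" mode
        self.off_threshold = off_threshold    # smoothed value needed to return to "babble" mode
        self.min_hold_frames = min_hold_frames
        self.smoothed = 0.0
        self.mode = "babble"
        self.frames_since_switch = min_hold_frames  # allow an immediate first switch

    def update(self, speech_probability):
        """Feed one frame's speech probability and return the current mode."""
        self.smoothed += self.alpha * (speech_probability - self.smoothed)
        self.frames_since_switch += 1
        if self.frames_since_switch >= self.min_hold_frames:
            if self.mode == "babble" and self.smoothed > self.on_threshold:
                self.mode, self.frames_since_switch = "speech", 0
            elif self.mode == "speech" and self.smoothed < self.off_threshold:
                self.mode, self.frames_since_switch = "babble", 0
        return self.mode


# Usage: words alternating with short pauses do not toggle the mode.
controller = DecisionController()
modes = [controller.update(1.0 if n % 2 == 0 else 0.0) for n in range(500)]
```

With per-frame speech probabilities alternating between 1 and 0, the smoothed value settles near 0.5 and the controller stays in the babble mode; a sustained run of speech frames is needed before it switches.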

According to a preferred embodiment of the invention, the decision controller comprises a
second set of HMMs operating on a substantially longer time scale of the input signal than
the HMM(s) in a first layer. Thereby, the processing means are adapted to process the
observation sequence of symbol values or the feature vectors with a first set of Hidden
Markov Models operating at a first time scale and associated with a first set of
predetermined sound sources to determine element values of a first classification vector.
Subsequently, the first classification vector is processed with the second set of Hidden
Markov Models operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

The first time scale is preferably selected within the range 10-100 ms to allow the first
set of HMMs to operate on individual signal features of common speech and noise signals,
and the second time scale is preferably selected within the range 1-60 seconds, such as
about 10 or 20 seconds, to allow the second set of HMMs to operate on changes between
different listening environments. Environmental changes usually occur when the user of
the hearing prosthesis moves between differing listening environments, e.g. between a
subway station and the interior of a train or a domestic environment, or between the
interior of a car and standing near a street with passing traffic.
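One way to relate these time scales to HMM parameters, offered here as an assumption rather than as the patent's method, uses the fact that the time an HMM stays in a state with self-transition probability $a_{ii}$ is geometrically distributed, with mean $1/(1 - a_{ii})$ frames. With the 6 ms frames of the embodiment described below, a mean dwell time of 10 s corresponds to about 1667 frames, i.e. $a_{ii} \approx 0.9994$. The Python helpers below (hypothetical names) perform this conversion in both directions.

```python
def self_transition_for_dwell(dwell_seconds, frame_seconds=0.006):
    """Self-transition probability a_ii giving an expected state duration of
    dwell_seconds, using E[duration] = 1 / (1 - a_ii) frames."""
    frames = dwell_seconds / frame_seconds
    if frames <= 1.0:
        raise ValueError("dwell time must exceed one frame")
    return 1.0 - 1.0 / frames


def expected_dwell_seconds(a_ii, frame_seconds=0.006):
    """Inverse mapping: mean time spent in a state with self-transition a_ii."""
    return frame_seconds / (1.0 - a_ii)


# Usage: a first-layer time scale (50 ms) versus second-layer time scales (10 s, 20 s).
for seconds in (0.05, 10.0, 20.0):
    print(f"{seconds:6.2f} s -> a_ii = {self_transition_for_dwell(seconds):.5f}")
```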

A second aspect of the invention relates to a method of generating automatic
classification of input signals in a hearing prosthesis, the method comprising the steps of:

receiving an acoustic signal from a listening environment by a microphone of the hearing
prosthesis to generate an input signal,

processing the input signal in accordance with a predetermined signal processing
algorithm and a plurality of related algorithm parameters stored in a memory area to
generate a processed output signal,

segmenting the input signal into consecutive signal frames of time duration $T_{frame}$,

generating respective feature vectors, $O(t)$, representing predetermined signal features
of the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^{source} = \{A^{source}, b(o(t)), \alpha^{source}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

controlling one or several values of the related algorithm parameters in dependence of
element value(s) of the classification vector to control characteristics of the processed
output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,




thereby adapting characteristics of the predetermined signal processing algorithm to the 
current listening environment; wherein 



$A^{source}$ = A state transition probability matrix;
$b(o(t))$ = Probability function for the observation $o(t)$ for each state of the at least one
Hidden Markov Model;
$\alpha^{source}$ = An initial state probability distribution vector.



The feature vectors may be subjected to a vector quantisation process by comparing each
of the respective feature vectors, $O(t)$, with a feature vector set or codebook, and
determining, for substantially each feature vector, an associated symbol value so as to
generate an observation sequence of symbol values associated with the consecutive
signal frames. By processing the observation sequence of symbol values with at least one
discrete Hidden Markov Model, $\lambda^{source} = \{A^{source}, B^{source}, \alpha^{source}\}$, associated with the
predetermined sound source, the element value or values of the classification vector may
be determined; wherein

$B^{source}$ = An observation symbol probability distribution matrix.

For hearing aid applications, it has been found useful to utilise at least a few HMMs in
order to recognise at least a few corresponding and common listening environments, so
that the method may comprise processing the feature vectors with a plurality of Hidden
Markov Models, or processing the observation sequence of symbol values with a
plurality of discrete Hidden Markov Models. According to this embodiment of the
invention, each of the discrete Hidden Markov Models or the Hidden Markov Models is
associated with a respective predetermined sound source to determine the element
values of the classification vector, each element value indicating a probability of the
respective predetermined sound source being active in the current listening environment.

According to a third aspect of the invention, a set of HMMs is utilised to recognise
respective isolated words to provide the hearing prosthesis with a capability of identifying
a small set of voice commands which the user may utilise to control one or several
functions of the hearing aid by his or her voice. For this word recognition feature, discrete
left-right HMMs are preferably utilised rather than the ergodic HMMs that are preferred for
the task of providing automatic listening environment classification. Since a left-right
HMM is a special case of an ergodic HMM, the HMM structure that is used for the
above-described ergodic HMMs may be at least partly re-used for the left-right HMMs.
This has the advantage that DSP memory and other hardware resources may be shared
in a hearing prosthesis that provides both automatic listening environment classification
and word recognition. Preferably, a number of isolated word HMMs, such as 2-8 HMMs,
is stored in the hearing prosthesis to allow the processing means to recognise a
corresponding number of distinct words. The output from each of the isolated word HMMs
is a probability for a modelled word being spoken. Each of the isolated word HMMs must
be trained on the particular word or command it must recognise during on-line processing
of the input signal. The training could be performed by applying a concatenated sound
source recording, including the particular word or command spoken by a number of
different individuals, to the associated HMM. Alternatively, the training of the isolated word
HMMs could be performed during a fitting session where the words or commands to be
modelled are spoken by the user himself to provide a personalised recognition function
in the user's hearing prosthesis.
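The isolated-word recognition described above can be sketched in Python as follows: each word has a left-right discrete HMM, the likelihood of the observed symbol sequence is computed with the scaled forward algorithm, and the word with the highest likelihood is selected. The two-word vocabulary, the three-state models, the binary symbol alphabet and all probabilities are made-up illustration values, the function names are hypothetical, and symbols are 0-based indices here.

```python
import math


def left_right_transitions(n_states, stay=0.8):
    """Left-right transition matrix: each state may only stay or move one step
    to the right; the last state is absorbing."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[-1][-1] = 1.0
    return A


def log_likelihood(symbols, A, B, initial):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM.
    B[i][k] is the probability of emitting symbol k in state i."""
    n = len(A)
    p = [initial[i] * B[i][symbols[0]] for i in range(n)]
    scale = sum(p)
    log_lik = math.log(scale)
    p = [v / scale for v in p]
    for o in symbols[1:]:
        p = [sum(p[j] * A[j][i] for j in range(n)) * B[i][o] for i in range(n)]
        scale = sum(p)                    # P(o(t) | earlier symbols, model)
        log_lik += math.log(scale)
        p = [v / scale for v in p]        # renormalise to avoid underflow
    return log_lik


def recognise(symbols, word_models):
    """Return the word whose HMM gives the highest likelihood."""
    return max(word_models, key=lambda w: log_likelihood(symbols, *word_models[w]))


# Usage with two hypothetical command words.
A = left_right_transitions(3)
start = [1.0, 0.0, 0.0]                   # left-right models start in the first state
models = {
    "louder": (A, [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]], start),
    "softer": (A, [[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]], start),
}
word = recognise([0, 0, 1, 1, 0, 0], models)
```

Because the left-right matrix is just an ordinary transition matrix with some zero entries, the same forward-algorithm code can serve both the ergodic classification HMMs and the word HMMs, which is the resource sharing the text describes.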

BRIEF DESCRIPTION OF THE DRAWINGS 

A preferred embodiment of a software programmable DSP based hearing aid according to
the invention is described in the following with reference to the drawings, wherein

Fig. 1 is a simplified block diagram of a three-chip DSP based hearing aid utilising Hidden
Markov Models for input signal classification according to the invention,

Fig. 2 is a signal flow diagram of a predetermined signal processing algorithm executed
on the three-chip DSP based hearing aid shown in Fig. 1,

Fig. 3 is a signal flow diagram illustrating a listening environment classification process,
and

Fig. 4 is a state diagram for the environment Hidden Markov Model shown in Fig. 3 as
block 550.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

In the following, a specific embodiment of a three-chip DSP based hearing aid
according to the invention is described and discussed in greater detail. The present
description discusses in detail only the operation of the signal processing part of a DSP
core or kernel with associated memory circuits. An overall circuit topology that may form
the basis of the DSP hearing aid is well known to the skilled person and is, accordingly,
reviewed in very general terms only.

In the simplified block diagram of Fig. 1, a conventional hearing aid microphone 105
receives an acoustic signal from a surrounding listening environment. The microphone
105 provides an analogue input signal on terminal MIC1IN of a proprietary A/D integrated
circuit 102. The analogue input signal is amplified in a microphone preamplifier 106 and
applied to an input of a first A/D converter of a dual A/D converter circuit 110 comprising
two synchronously operating converters of the sigma-delta type. A serial digital data
stream or signal is generated in a serial interface circuit 111 and transmitted from terminal
A/DDAT of the proprietary A/D integrated circuit 102 to a proprietary Digital Signal
Processor circuit 2 (DSP circuit). The DSP circuit 2 comprises an A/D decimator 13 which
is adapted to receive the serial digital data stream and convert it into corresponding 16-bit
audio samples at a lower sampling rate for further processing in a DSP core 5. The DSP
core 5 has an associated program Random Access Memory (program RAM) 6, data RAM 7
and Read Only Memory (ROM) 8. The signal processing of the DSP core 5, which is
described below with reference to the signal flow diagram in Fig. 2, is controlled by
program instructions read from the program RAM 6.

A serial bi-directional 2-wire programming interface 300 allows a host programming
system (not shown) to communicate with the DSP circuit 2, over a serial interface circuit
12, and a commercially available EEPROM 202 to perform up/downloading of signal
processing algorithms and/or associated algorithm parameter values.

A digital output signal generated by the DSP core 5 from the analogue input signal is
transmitted to a Pulse Width Modulator circuit 14 that converts received output samples to
a pulse width modulated (PWM) and noise-shaped processed output signal. The
processed output signal is applied to two terminals of hearing aid receiver 10 which, by its
inherent low-pass filter characteristic, converts the processed output signal to a
corresponding acoustic audio signal. An internal clock generator and amplifier 20 receives
a master clock signal from an LC oscillator tank circuit formed by L1 and C5 that, in
co-operation with an internal master clock circuit 112 of the A/D circuit 102, forms a master
clock for both the DSP circuit and the A/D circuit 102. The DSP core 5 may be directly
clocked by the master clock signal or from a divided clock signal. The DSP core 5 is
preferably clocked with a frequency of about 2-4 MHz.

Fig. 2 illustrates a relatively simple application of discrete Hidden Markov Models to
control algorithm parameter values of a predetermined signal processing algorithm of the
DSP based hearing aid shown in Fig. 1. The discrete Hidden Markov Models are used in
the hearing aid or instrument to provide automatic classification of three different listening
environments: speech in traffic noise, speech in babble noise, and clean speech, as
illustrated in Fig. 4. In the present embodiment of the invention, each listening
environment is connected with a particular pre-set frequency response implemented by
FIR-filter block 450 that receives its filter parameter values from a filter choice controller
430. Operations of both the FIR-filter block 450 and the filter choice controller 430 are
preferably performed by respective sub-routines executed on the DSP core 5. Switching
between different FIR-filter parameter values is automatically performed when the user of
the hearing aid moves between different listening environments, which is detected by
a listening environment classification algorithm 420 comprising two sets of discrete
HMMs operating at differing time scales, as will be explained with reference to Figs. 3 and
4. Another possibility is to let the listening environment classifier 420 supplement an
additional multi-channel AGC algorithm or system, which could be inserted between the
input (IN) and the FIR-filter block 450, calculating, or determining by table lookup, gain
values for consecutive signal frames of the input signal.
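A minimal Python sketch of the filter choice controller 430 and FIR-filter block 450 follows. It assumes that the controller simply selects the pre-computed impulse response of the most probable listening environment and that all impulse responses have the same length; the patent does not state the selection rule, and the names `choose_filter` and `FIRFilter` are hypothetical.

```python
def choose_filter(classification, impulse_responses):
    """Filter choice: return the pre-computed FIR impulse response belonging
    to the most probable listening environment."""
    best = max(range(len(classification)), key=lambda i: classification[i])
    return impulse_responses[best]


class FIRFilter:
    """Direct-form FIR filter whose delay line survives coefficient changes,
    so a new impulse response can be loaded between frames without
    resetting the filter history."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.history = [0.0] * len(self.coefficients)  # history[k] = x[n - k]

    def set_coefficients(self, coefficients):
        if len(coefficients) != len(self.coefficients):
            raise ValueError("all pre-computed impulse responses must have the same length")
        self.coefficients = list(coefficients)

    def process(self, frame):
        out = []
        for x in frame:
            self.history = [x] + self.history[:-1]  # shift in the newest sample
            out.append(sum(h * s for h, s in zip(self.coefficients, self.history)))
        return out


# Usage: three hypothetical responses (speech in traffic, speech in babble, clean speech).
responses = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
selected = choose_filter([0.1, 0.7, 0.2], responses)
fir = FIRFilter(selected)
first_frame = fir.process([1.0, 1.0, 1.0])
```

Keeping the delay line when coefficients are swapped avoids a transient caused by zeroing the filter state at every environment switch.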

The user may have a favorite frequency response/gain for each of the listening
environments that can be recognized/classified by its corresponding discrete Hidden
Markov Model. These favorite frequency responses/gains may be found by applying a
number of standard prescription methods, such as NAL, POGO etc., combined with
individual interactive fine-tuning methods.

In Fig. 2, a raw input signal at node IN, provided by the output of the A/D decimator 13 in
Fig. 1, is segmented to form consecutive signal frames, each with a duration of 6 ms. The
input signal is preferably sampled at 16 kHz at this node so that each frame consists of 96
audio signal samples. The signal processing is performed along two different paths: a
classification path through signal blocks 410, 420, 440 and 430, and a predetermined
signal processing path through block 450. Pre-computed impulse responses of the
respective FIR filters are stored in the data RAM during program execution. The choice of
parameter values or coefficients for the FIR filter block 450 is performed by the Filter
Choice Block 430 based on the element values of the classification vector and, optionally,
on data from the Spectrum Estimation Block 440.
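The frame arithmetic above (16,000 samples/s times 0.006 s = 96 samples) and the segmentation into consecutive frames can be sketched in Python as below. Non-overlapping frames are an assumption, and the helper names are hypothetical.

```python
def frame_length(sample_rate_hz=16000, frame_ms=6):
    """Samples per frame: 16000 Hz * 6 ms = 96 samples."""
    return sample_rate_hz * frame_ms // 1000


def segment(samples, frame_len):
    """Split a sample stream into consecutive, non-overlapping frames; an
    incomplete trailing frame is held back until more samples arrive."""
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


# Usage: 1000 samples give ten complete 96-sample frames (40 samples left over).
frames = segment(list(range(1000)), frame_length())
```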

Fig. 3 shows a signal flow diagram of a preferred implementation of the classification
block 420 of Fig. 2. A vector quantizer (VQ) block 510 precedes the dual layer HMM
architecture, where blocks 520, 521, 522 form a first HMM layer and block 550 is a second
HMM layer. The system therefore consists of four stages: a feature extraction layer 500, a
sound feature classification layer 510, the first HMM layer in the form of a sound source
classification layer 520-522, and a second HMM layer in the form of a listening
environment classification layer 550. The sound source classification layer uses three or
five Hidden Markov Models, and a single HMM is used in the listening environment
classification layer 550.

The structure of the classification block 420 makes it possible to have different switching
times between different listening environments, e.g. slow switching between traffic and
babble and fast switching between traffic and speech.

The output signal OUT1 of classification block 420 is a classification vector in which each
element contains the probability that a particular sound source of the three
predetermined sound sources 520, 521, 522, modelled by their respective discrete HMMs, is
active. The output signal OUT2 is another classification vector, in which each element
contains the probability that a particular listening environment is active.

The processing of the input signal in the above-mentioned classification path is described
in the following with reference to the implementation in Fig. 3:

The input at time $t$ is a block $x(t)$, of size $B$, with input signal samples.

$x(t)$ is multiplied with a window, $w_n$, and the Discrete Fourier Transform, DFT, is
calculated:

$$X_k(t) = \sum_{n=0}^{B-1} w_n \, x_n(t) \, e^{-j 2\pi k n / B}, \quad k = 0, \ldots, B/2 - 1$$

A feature vector is extracted or computed for every new frame. It is presently preferred to
use 12 cepstrum parameters for each feature vector.
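The windowed DFT above, followed by a reduction to 12 cepstrum parameters, can be sketched in Python as below. Assumptions: a periodic Hann window (the patent does not name the window), a direct O(B²) DFT for clarity rather than an FFT, and cepstrum-like coefficients computed as a DCT-II of the log power spectrum over the B/2 computed bins; other cepstrum definitions exist. All function names are hypothetical.

```python
import cmath
import math


def hann(B):
    """Periodic Hann window w_n of length B (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / B) for n in range(B)]


def windowed_dft(block, window):
    """X_k = sum_n w_n x_n exp(-j 2 pi k n / B) for k = 0 .. B/2 - 1 (direct DFT)."""
    B = len(block)
    return [sum(window[n] * block[n] * cmath.exp(-2j * math.pi * k * n / B) for n in range(B))
            for k in range(B // 2)]


def cepstrum(spectrum, n_coeffs=12, floor=1e-10):
    """Cepstrum-like coefficients: DCT-II of the log power spectrum.
    `floor` keeps log() finite for empty bins."""
    K = len(spectrum)
    log_power = [math.log(abs(X) ** 2 + floor) for X in spectrum]
    return [sum(log_power[k] * math.cos(math.pi * n * (k + 0.5) / K) for k in range(K)) / K
            for n in range(n_coeffs)]


# Usage: one 96-sample frame (B = 96) of a test tone, giving a 12-element feature vector.
B = 96
block = [math.cos(2 * math.pi * 8 * n / B) for n in range(B)]
features = cepstrum(windowed_dft(block, hann(B)))
```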

The output at time $t$ is a feature column vector, $\mathbf{f}(t)$, with continuous-valued elements.
The corresponding differential cepstrum parameter vector (often called delta-cepstrum) is
calculated as

$$\Delta\mathbf{f}(t) = \sum_{k=0}^{K-1} h_k \, \mathbf{f}(t-k),$$

where the coefficients $h_k$ are determined such that $\Delta\mathbf{f}(t)$ approximates the first
derivative of $\mathbf{f}(t)$ with respect to the time $t$. A preferred length of the filter defined by
the coefficients $h_k$ is $K = 8$.
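The patent does not give the delta-cepstrum coefficients $h_k$. One common choice, assumed here, is the least-squares slope of a straight line fitted to the last $K = 8$ feature vectors, which gives $h_k = ((K-1)/2 - k) / \sum_{m=0}^{K-1} (m - (K-1)/2)^2$ and reproduces the slope of a linear trend exactly (in units per frame). The Python sketch below (hypothetical names) applies this FIR filter to a stream of cepstrum vectors.

```python
def delta_coefficients(K=8):
    """h_k for Delta f(t) = sum_{k=0}^{K-1} h_k f(t - k): least-squares slope
    of a line fitted to the last K frames (an assumed design)."""
    centre = (K - 1) / 2
    denom = sum((m - centre) ** 2 for m in range(K))
    return [(centre - k) / denom for k in range(K)]


class DeltaCepstrum:
    """Streaming delta-cepstrum: keeps the last K feature vectors."""

    def __init__(self, n_params=12, K=8):
        self.h = delta_coefficients(K)
        self.history = [[0.0] * n_params for _ in range(K)]  # history[k] = f(t - k)

    def update(self, f):
        self.history = [list(f)] + self.history[:-1]
        return [sum(hk * past[i] for hk, past in zip(self.h, self.history))
                for i in range(len(f))]


# Usage: a feature vector whose elements change linearly over time.
tracker = DeltaCepstrum(n_params=2, K=8)
for t in range(20):
    delta = tracker.update([0.5 * t, -2.0 * t])
```

Once the history is full, `delta` equals the per-frame slopes of the two ramps.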

The delta-cepstrum coefficients are sent to the vector quantizer in the classification block
420. Other features, e.g. time domain features or other frequency-based features, may be
added.

The classification block 420 comprises three layers operating at different time scales: (1)
a Short-term Layer (Sound Feature Classification) 510, operating instantly on each signal
frame, (2) a Medium-term Layer (Sound Source Classification) 520-522, operating in the
time scale of envelope modulations within predetermined sound sources modelled by the
four HMMs, and (3) a Long-term Layer (Listening Environment Classification) 550,
operating in a slower time scale corresponding to shifts between different sound sources
in a given listening environment or the shift between different listening environments. This
is further illustrated in Fig. 4.

The predetermined sound sources modelled by the present embodiment of the invention
are a traffic noise source, a babble noise source, and a clean speech source, but could also
comprise mixed sound sources that each may contain a predetermined proportion of, e.g.,
speech and babble or speech and traffic noise, as illustrated in Fig. 4. The final output of
the classifier is a listening environment probability vector, OUT1, continuously indicating a
current probability estimate for each listening environment, and a sound source probability
vector, OUT2, indicating the estimated probability for each sound source. A listening
environment may consist of one of the predetermined sound sources 520-522 or a
combination of two or more of the predetermined sound sources, as illustrated in more
detail in the description of Fig. 4.

The input to the vector quantizer block 510 is a feature vector with continuously valued
elements. The vector quantizer has $M$, e.g. 32, codewords in the codebook $[\mathbf{c}_1 \ldots \mathbf{c}_M]$
approximating the complete feature space. The feature vector is quantized to the closest
codeword in the codebook, and the index $o(t)$, an integer between 1 and $M$, of the
closest codeword is generated as output:

$$o(t) = \arg\min_{i} \left\| \Delta\mathbf{f}(t) - \mathbf{c}_i \right\|^2$$
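The nearest-codeword rule above is short enough to write directly. This Python sketch (hypothetical function name and toy codebook) returns the 1-based index $o(t)$ used by the HMMs; it uses squared Euclidean distance, which selects the same codeword as the plain Euclidean norm.

```python
def quantise(feature, codebook):
    """Return o(t) in 1..M: index of the codeword closest to `feature`."""
    def dist2(codeword):
        return sum((f - c) ** 2 for f, c in zip(feature, codeword))
    return 1 + min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


# Usage with a toy two-dimensional codebook (M = 3).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
symbol = quantise([0.9, 1.2], codebook)
```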

The VQ is trained off-line with the Generalized Lloyd algorithm (Linde, 1980). Training
material consisted of real-life recordings of sound-source samples. These recordings
have been made through the input signal path, shown in Fig. 1, of the DSP based
hearing instrument.
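The Generalized Lloyd algorithm alternates two steps: assign every training vector to its nearest codeword, then move each codeword to the centroid of its assigned vectors. The Python sketch below runs a fixed number of iterations and initialises the codebook with randomly chosen training vectors; both choices are assumptions (a splitting initialisation is a common alternative), the function name is hypothetical, and the toy two-cluster data merely stand in for the real-life recordings.

```python
import random


def train_codebook(training, M, iterations=20, seed=0):
    """Generalized Lloyd algorithm for an M-word VQ codebook."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, M)]  # initial codewords
    for _ in range(iterations):
        # Partition step: assign each training vector to its nearest codeword.
        cells = [[] for _ in range(M)]
        for v in training:
            i = min(range(M), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, codebook[j])))
            cells[i].append(v)
        # Centroid step: move each codeword to the mean of its cell.
        for j, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[j] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


# Usage: two well-separated clusters give one codeword per cluster.
training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
codebook = sorted(train_codebook(training, 2))
```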


Each of the three sound sources is modelled by a respective discrete HMM. Each HMM
consists of a state transition probability matrix, $A^{source}$, an observation symbol probability
distribution matrix, $B^{source}$, and an initial state probability distribution column vector,
$\alpha^{source}$. A compact notation for an HMM is $\lambda^{source} = \{A^{source}, B^{source}, \alpha^{source}\}$. Each sound
source model has $N = 4$ internal states and observes the stream of VQ symbol values or
centroid indices $[o(1) \cdots o(t)]$, $o(t) \in [1, M]$. The current state at time $t$ is modelled
as a stochastic variable $Q^{source}(t) \in \{1, \ldots, N\}$.

The purpose of the medium-term layer is to estimate how well each source model can
explain the current input observation $o(t)$. The output is a column vector $\mathbf{u}(t)$ with
elements indicating the conditional probabilities

$$u^{source}(t) = \mathrm{prob}\left(o(t) \mid o(t-1), \ldots, o(1), \lambda^{source}\right)$$

for each source.

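The standard forward algorithm (Rabiner, 1989) is used to recursively update the state
probability column vector $\mathbf{p}^{source}(t)$. The elements $p_i^{source}(t)$ of this vector indicate the
conditional probability that the sound source is in state $i$,

$$p_i^{source}(t) = \mathrm{prob}\left(Q^{source}(t) = i \mid o(t), o(t-1), \ldots, o(1), \lambda^{source}\right)$$

The recursive update equations are:

$$\hat{\mathbf{p}}^{source}(t) = \left( (A^{source})^T \mathbf{p}^{source}(t-1) \right) \circ \mathbf{b}^{source}(o(t))$$

$$u^{source}(t) = \mathrm{prob}\left(o(t) \mid o(t-1), \ldots, o(1), \lambda^{source}\right) = \sum_i \hat{p}_i^{source}(t)$$

$$\mathbf{p}^{source}(t) = \hat{\mathbf{p}}^{source}(t) / u^{source}(t)$$

wherein the operator $\circ$ defines element-wise multiplication and $\mathbf{b}^{source}(o(t))$ is the
column of $B^{source}$ holding the probability of observing symbol $o(t)$ in each state.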
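The recursion above maps directly onto code. The following Python sketch (hypothetical function names, made-up two-state model) performs the normalised forward update for one discrete source HMM and returns both the state probabilities $\mathbf{p}^{source}(t)$ and the observation probability $u^{source}(t)$. The first frame is initialised with $\alpha^{source}$ in place of the predicted vector, as in the standard forward algorithm; symbols are 1-based, matching the VQ output.

```python
def forward_init(alpha, B, o):
    """First frame: p_hat = alpha o b(o(1)); returns (p(1), u(1))."""
    p_hat = [alpha[j] * B[j][o - 1] for j in range(len(alpha))]
    u = sum(p_hat)
    return [v / u for v in p_hat], u


def forward_step(p_prev, A, B, o):
    """One normalised forward step for a discrete source HMM.

    A[i][j] = P(Q(t) = j | Q(t-1) = i); B[j][k] = P(symbol k+1 | state j).
    Returns (p(t), u(t)) with u(t) = P(o(t) | o(t-1), ..., o(1), model).
    """
    N = len(A)
    predicted = [sum(A[i][j] * p_prev[i] for i in range(N)) for j in range(N)]  # A^T p(t-1)
    p_hat = [predicted[j] * B[j][o - 1] for j in range(N)]  # element-wise product with b(o(t))
    u = sum(p_hat)
    return [v / u for v in p_hat], u


# Usage with a made-up two-state, two-symbol model.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
p1, u1 = forward_init([0.5, 0.5], B, 1)
p2, u2 = forward_step(p1, A, B, 2)
```

Multiplying successive values of $u^{source}(t)$ gives the joint probability of the observed symbol sequence under the model, while the normalisation keeps the state vector within floating-point range on long input streams.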

Fig. 4 shows in more detail a slightly modified version of the dual layer HMM structure
illustrated in Fig. 3, in which the first layer of HMMs 520-522 comprises two additional
HMMs: a fourth HMM modelling a predetermined sound source of "speech in traffic noise"
and a fifth HMM modelling a predetermined sound source "speech in cafeteria babble".

Signal OUT1 of the final HMM layer 550 estimates current probabilities for each of the
modelled listening environments by observing the stream of sound source probability
vectors from the previous layer of HMMs. The listening environment is represented by a
discrete stochastic variable $E(t) \in \{1, \ldots, 3\}$, with outcomes coded as 1 for "speech in traffic
noise", 2 for "speech in cafeteria babble", and 3 for "clean speech". Thus, the output
probability vector or classification vector has three elements, one for each of these
environments. The final HMM layer 550 contains five states representing Traffic noise,
Speech (in traffic, "Speech/T"), Babble, Speech (in babble, "Speech/B"), and Clean
Speech ("Speech/C"). Transitions between listening environments, indicated by dashed
arrows, have low probability, and transitions between states within one listening
environment, shown by solid arrows, have relatively high probabilities.

The final HMM layer 550 consists of a Hidden Markov Model with five states and transition
probability matrix $A^{env}$ (Fig. 4). The current state in the environment hidden Markov
model is modelled as a discrete stochastic variable $S(t) \in \{1, \ldots, 5\}$, with outcomes coded as
1 for "traffic", 2 for speech (in traffic noise, "speech/T"), 3 for "babble", 4 for speech (in
babble, "speech/B"), and 5 for clean speech ("speech/C").

The speech in traffic noise listening environment, $E(t) = 1$, has two states, $S(t) = 1$ and
$S(t) = 2$. The speech in cafeteria babble listening situation, $E(t) = 2$, has two states,
$S(t) = 3$ and $S(t) = 4$. The clean speech listening environment, $E(t) = 3$, has only one
state, $S(t) = 5$. The transition probabilities between listening environments are relatively
low and the transition probabilities between states within a listening environment are high.

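The environment Hidden Markov Model 560 observes the stream of vectors
$[\mathbf{u}(1) \ldots \mathbf{u}(t)]$, where

$$\mathbf{u}(t) = \left[ u_1(t) \;\; u_2(t) \;\; u_3(t) \;\; u_4(t) \;\; u_5(t) \right]^T$$

contains the estimated observation probabilities for each state. The probability for being in
a state given the current and all previous observations and given the environment Hidden
Markov Model,

$$p_i^{env}(t) = \mathrm{prob}\left(S(t) = i \mid \mathbf{u}(t), \ldots, \mathbf{u}(1), \lambda^{env}\right),$$

is calculated with the forward algorithm (Rabiner, 1989),

$$\hat{\mathbf{p}}^{env}(t) = \left( (A^{env})^T \mathbf{p}^{env}(t-1) \right) \circ \mathbf{u}(t),$$

with elements $\hat{p}_i^{env}(t) = \mathrm{prob}\left(S(t) = i, \mathbf{u}(t) \mid \mathbf{u}(t-1), \ldots, \mathbf{u}(1), \lambda^{env}\right)$, and finally,
with normalization,

$$\mathbf{p}^{env}(t) = \hat{\mathbf{p}}^{env}(t) \Big/ \sum_i \hat{p}_i^{env}(t).$$

The probability for each listening environment, $\mathbf{p}^E(t)$, given all previous observations and
given the environment hidden Markov model, can now be calculated as

$$\mathbf{p}^E(t) = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \mathbf{p}^{env}(t)$$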

As previously mentioned, the spectrum estimation block 440 of Fig. 2 is optional but may
be utilized to estimate an average frequency spectrum which adapts slowly to the current
listening environment. Another possibility is to estimate two or more slowly adapting
spectra for different sound sources in a given listening environment, e.g. one speech
spectrum and one noise spectrum.
The source probabilities, $\mathbf{u}(t)$, the environment probabilities, $\mathbf{p}^E(t)$, and the current
log power spectrum, $X(t)$, are used to estimate the current signal and noise log power
spectra. Two low-pass filters are used in the estimation, one filter for the signal spectrum
and one filter for the noise spectrum. The signal spectrum is updated if $p^E_1(t) > p^E_2(t)$ and
$u^{speech}(t) > u^{traffic}(t)$, or if $p^E_2(t) > p^E_1(t)$ and $u^{speech}(t) > u^{babble}(t)$. The noise spectrum is
updated if $p^E_1(t) > p^E_2(t)$ and $u^{traffic}(t) > u^{speech}(t)$, or if $p^E_2(t) > p^E_1(t)$ and
$u^{babble}(t) > u^{speech}(t)$.

NOTATION:

M    Number of centroids in Vector Quantizer
N    Number of states in HMM
$\lambda^{source} = \{A^{source}, B^{source}, \alpha^{source}\}$    Compact notation for a discrete HMM, describing a source,
     with N states and M observation symbols
B    Blocksize
$O = [o_1 \ldots o_t]$    Observation sequence
$o_t$    Discrete observation at time t
$\mathbf{f}(t)$    Feature vector
w    Window of size B
$x(t)$    One block of size B, at time t, of raw input samples
$X(t)$    The corresponding discrete complex spectrum, of size B, at time t

CLAIMS

1. A hearing prosthesis comprising:

a microphone adapted to generate an input signal in response to receiving an acoustic
signal from a listening environment,

an output transducer for converting a processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a
predetermined signal processing algorithm and related algorithm parameters to generate
the processed output signal,

a memory area storing values of the related algorithm parameters for the predetermined
signal processing algorithm,

the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration $T_{frame}$, and

generate respective feature vectors, $O(t)$, representing predetermined signal features of
the consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model,
$\lambda^{source} = \{A^{source}, b(o(t)), \alpha^{source}\}$, associated with a predetermined sound source to
determine an element value or values of a classification vector indicating a probability of
the predetermined sound source being active in the listening environment,

control one or several values of the related algorithm parameters in dependence of
element value(s) of the classification vector,

thereby adapting characteristics of the predetermined signal processing algorithm to the
current listening environment; wherein:

$A^{source}$ = A state transition probability matrix;

$b(o(t))$ = Probability function for an input observation for each state of the at least
one Hidden Markov Model;

$\alpha^{source}$ = An initial state probability distribution vector.

2. A hearing prosthesis according to claim 1, wherein the processing means are adapted
to:

compare each of the feature vectors, $O(t)$, with a feature vector set to determine, for
substantially each feature vector, an associated symbol value so as to generate an
observation sequence of symbol values associated with the consecutive signal frames,

process the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^{source} = \{A^{source}, B^{source}, \alpha^{source}\}$, associated with the predetermined sound
source to determine the element value(s) of the classification vector; wherein:

$B^{source}$ = An observation symbol probability distribution matrix.

3. A hearing prosthesis according to claim 1 or 2, wherein the processing means are
adapted to process the feature vectors with a plurality of Hidden Markov Models, or
process the observation sequence of symbol values with a plurality of discrete Hidden
Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of a respective
predetermined sound source being active in the listening environment.

4. A hearing prosthesis according to any of the preceding claims, wherein the value of
$T_{frame}$ is between 1 and 100 milliseconds, such as about 5-10 milliseconds.

5. A hearing prosthesis according to claim 3, wherein at least some of the plurality of 
Hidden Markov Models are adapted to model respective predetermined sound sources 
selected from the group consisting of: {speech, telephone speech, traffic noise, multi- 
talker or babble noise, subway noise, transient noise, wind noise}. 
6. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of frequency-domain parameters
representing the predetermined signal features of the consecutive signal frames.

7. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of time-domain parameters representing
the predetermined signal features of the consecutive signal frames.

8. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of cepstrum parameters or differential
cepstrum parameters representing the predetermined signal features of the consecutive
signal frames.

9. A hearing prosthesis according to any of claims 2-8, wherein the feature vector set has
been determined in an off-line training procedure which utilised real-life sound source
recordings, and stored in non-volatile memory locations of the hearing instrument.

10. A hearing prosthesis according to claim 9, wherein the real-life sound
recordings have been applied to an input signal path of a target hearing prosthesis or by
performing an equivalent signal processing of the input signal to simulate characteristics
of the input signal path.

11. A hearing prosthesis according to any of claims 2-10, wherein each of the feature
vectors is associated with respective integer symbol values during a vector quantisation
process.

12. A hearing prosthesis according to any of the preceding claims, wherein the Hidden
Markov Model or Models comprise at least one ergodic Hidden Markov Model.

13. A hearing prosthesis according to any of the preceding claims, wherein the at least
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden
Markov Models comprises between 2 and 10 states.
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14. A hearing prosthesis according to any of the preceding claims, wherein the at least 
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden 
Markov Models comprises between 8 and 256 discrete symbols. 

15. A hearing prosthesis according to any of the preceding claims, wherein the processing
means are adapted to process the input signal in accordance with at least two different
predetermined signal processing algorithms, each being associated with a respective set
of algorithm parameters,

the processing means being further adapted to control switching between the at least
two predetermined signal processing algorithms in dependence on the determined
element value(s) of the classification vector.
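
A minimal sketch of the switching in claim 15 selects the parameter set associated with the most probable sound source. The parameter names and values are hypothetical placeholders. The claim itself does not say that the maximum element is used, so this is one assumed rule among several possible ones.

```python
# Hypothetical parameter sets, one per predetermined sound source.
PARAMETER_SETS = [
    {"algorithm": "speech_enhancement", "gain_db": 20.0},
    {"algorithm": "noise_reduction", "gain_db": 12.0},
]


def select_parameters(classification_vector, parameter_sets):
    """Return the parameter set of the most probable predetermined sound source."""
    best = max(range(len(classification_vector)), key=lambda i: classification_vector[i])
    return parameter_sets[best]
```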

16. A hearing prosthesis according to claim 15, wherein the processing means are
adapted to process two input signals from a pair of omni-directional microphones by a first
predetermined signal processing algorithm with a first set of algorithm parameters, and

adapted to process the two input signals by a second predetermined signal processing
algorithm with a second set of algorithm parameters.

17. A hearing prosthesis according to any of claims 3-16, wherein the processing means
further comprises a decision controller adapted to monitor the elements of the
classification vector and to control transitions between the plurality of Hidden Markov Models
in accordance with a predetermined set of rules.
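
The claim leaves the rule set open. One plausible rule set for the decision controller of claim 17 is a hysteresis rule, sketched here as a hypothetical example. Under this rule, a new class is adopted only after its probability has exceeded a threshold for a number of consecutive frames, which suppresses rapid toggling. The class name, threshold and hold length are all assumptions.

```python
class DecisionController:
    """Hypothetical rule set: switch to a new class only when its probability
    stays at or above `threshold` for `hold_frames` consecutive frames."""

    def __init__(self, threshold=0.7, hold_frames=3, initial=0):
        self.threshold = threshold
        self.hold_frames = hold_frames
        self.current = initial
        self._candidate = None
        self._count = 0

    def update(self, classification_vector):
        best = max(range(len(classification_vector)), key=lambda i: classification_vector[i])
        if best != self.current and classification_vector[best] >= self.threshold:
            # Count consecutive frames in which the same challenger wins.
            if best == self._candidate:
                self._count += 1
            else:
                self._candidate, self._count = best, 1
            if self._count >= self.hold_frames:
                self.current = best
                self._candidate, self._count = None, 0
        else:
            # Any interruption resets the hold counter.
            self._candidate, self._count = None, 0
        return self.current
```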

18. A hearing prosthesis according to any of claims 3-16, wherein the processing means
are adapted to process the observation sequence of symbol values or the feature vectors
with a first set of Hidden Markov Models operating at a first time scale and associated with
a first set of predetermined sound sources to determine element values of a first
classification vector, and

adapted to process the first classification vector with a second set of Hidden Markov
Models operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

19. A hearing prosthesis according to claim 18, wherein the first time scale is selected
within the range 10-100 ms and the second time scale is selected within the range
1-60 seconds.
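
The relation between the two time scales of claims 18 and 19 can be sketched as follows. With a first time scale of 20 ms and a second time scale of 1 s, the second layer advances once per 50 first-layer outputs. The reduction of each block of first-layer classification vectors to a single symbol (the class with the largest mean probability) is a hypothetical preprocessing step for the second set of models and is not stated in the claims.

```python
def frames_per_block(short_scale_s, long_scale_s):
    """Number of first-layer outputs per second-layer step (claim 19 ranges)."""
    if not 0.010 <= short_scale_s <= 0.100:
        raise ValueError("first time scale outside 10-100 ms")
    if not 1.0 <= long_scale_s <= 60.0:
        raise ValueError("second time scale outside 1-60 s")
    return round(long_scale_s / short_scale_s)


def second_layer_observations(first_vectors, block_len):
    """Hypothetical reduction: one symbol per block, the class with the largest
    mean first-layer probability. An incomplete trailing block is ignored."""
    symbols = []
    for start in range(0, len(first_vectors) - block_len + 1, block_len):
        block = first_vectors[start:start + block_len]
        means = [sum(col) / block_len for col in zip(*block)]
        symbols.append(max(range(len(means)), key=lambda i: means[i]))
    return symbols
```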

20. A hearing prosthesis according to any of the preceding claims, wherein the processing 
means comprises a software programmable processor. 

21. A method of generating automatic classification of input signals in a hearing
prosthesis, the method comprising the steps of:

receiving an acoustic signal from a listening environment by a microphone of the hearing 
prosthesis to generate an input signal, 

processing the input signal in accordance with a predetermined signal processing
algorithm and related algorithm parameters to generate a processed output signal, 

segmenting the input signal into consecutive signal frames of time duration,

generating respective feature vectors, $O(t)$, representing predetermined signal features
of the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^{\text{source}} = \{A^{\text{source}}, b^{\text{source}}(O(t)), \alpha_0^{\text{source}}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

controlling one or several values of the related algorithm parameters in dependence on
element value(s) of the classification vector to control characteristics of the processed
output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,

thereby adapting characteristics of the main processing algorithm to the listening
environment, wherein

$A^{\text{source}}$ = transition probability matrix;
$b^{\text{source}}(O(t))$ = probability function for the observation $O(t)$ for each state of the at least one
Hidden Markov Model;
$\alpha_0^{\text{source}}$ = initial state probability distribution vector.
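
To illustrate how a model $\lambda^{\text{source}} = \{A^{\text{source}}, b^{\text{source}}(O(t)), \alpha_0^{\text{source}}\}$ can be evaluated frame by frame, the following sketch implements the standard scaled forward recursion. It assumes scalar observations and one Gaussian density per state for $b^{\text{source}}(O(t))$. The claim does not specify the form of the probability function, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward(observations, A, alpha0, means, variances):
    """Scaled forward recursion for lambda = {A, b(O(t)), alpha0}.

    Returns the log-likelihood of the observation sequence and the normalised
    state probabilities after the last frame."""
    n = len(alpha0)
    log_likelihood = 0.0
    alpha = list(alpha0)
    for t, o in enumerate(observations):
        if t > 0:
            # Predict: propagate state probabilities through the transition matrix.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state j by b_j(O(t)).
        unnormalised = [alpha[j] * gaussian_pdf(o, means[j], variances[j]) for j in range(n)]
        scale = sum(unnormalised)
        # The product of the scale factors equals P(O(1..t) | lambda).
        log_likelihood += math.log(scale)
        alpha = [u / scale for u in unnormalised]
    return log_likelihood, alpha
```

Normalising at every frame keeps the recursion numerically stable over long observation sequences, which matters for continuous operation in a hearing prosthesis.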

22. A method according to claim 21, comprising the steps of:

comparing each of the respective feature vectors, $O(t)$, with a feature vector set,

determining, for substantially each feature vector, an associated symbol value so as to 
generate an observation sequence of symbol values associated with the consecutive 
signal frames,

processing the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^{\text{source}} = \{A^{\text{source}}, B^{\text{source}}, \alpha_0^{\text{source}}\}$, associated with the predetermined sound
source to determine the element value or values of the classification vector, wherein

$B^{\text{source}}$ = observation symbol probability distribution matrix.
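
For the discrete model of claim 22, the observation probability $b^{\text{source}}(O(t))$ becomes a lookup in the column of $B^{\text{source}}$ for the current symbol. The sketch below is the standard scaled forward recursion under that reading. The function name is hypothetical.

```python
import math


def discrete_forward(symbols, A, B, alpha0):
    """Scaled forward recursion for a discrete HMM lambda = {A, B, alpha0}.

    Returns log P(symbols | lambda) and the normalised state probabilities."""
    n = len(alpha0)
    log_likelihood = 0.0
    alpha = list(alpha0)
    for t, k in enumerate(symbols):
        if t > 0:
            # Predict the state distribution for the new frame.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Weight by the probability of observing symbol k in each state.
        unnormalised = [alpha[j] * B[j][k] for j in range(n)]
        scale = sum(unnormalised)
        log_likelihood += math.log(scale)
        alpha = [u / scale for u in unnormalised]
    return log_likelihood, alpha
```

For instance, with `A = [[0.9, 0.1], [0.1, 0.9]]`, `B = [[0.8, 0.2], [0.3, 0.7]]` and `alpha0 = [0.5, 0.5]`, the symbol sequence `[0, 1]` has likelihood 0.1975, which matches a brute-force sum over all four state paths.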

23. A method according to claim 21 or 22, wherein the processor is adapted to process
the feature vectors with a plurality of Hidden Markov Models, or to process the observation
sequence of symbol values with a plurality of discrete Hidden Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of the respective
predetermined sound source being active in the listening environment.
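
With a plurality of discrete models as in claim 23, one possible way to form the classification vector is to convert each model's likelihood into a posterior probability. The sketch assumes equal prior probabilities for all sound sources. The claims do not state this rule, and the function names are hypothetical. Each model is passed as a tuple `(A, B, alpha0)`.

```python
import math


def discrete_log_likelihood(symbols, A, B, alpha0):
    """Scaled forward recursion; returns log P(symbols | lambda)."""
    n = len(alpha0)
    alpha, log_lik = list(alpha0), 0.0
    for t, k in enumerate(symbols):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
        unnormalised = [alpha[j] * B[j][k] for j in range(n)]
        scale = sum(unnormalised)
        log_lik += math.log(scale)
        alpha = [u / scale for u in unnormalised]
    return log_lik


def classification_vector(symbols, models):
    """One element per sound-source model: posterior probability of that source,
    assuming equal prior probabilities for all sources."""
    log_liks = [discrete_log_likelihood(symbols, *model) for model in models]
    peak = max(log_liks)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - peak) for ll in log_liks]
    total = sum(weights)
    return [w / total for w in weights]
```

Working in the log domain and subtracting the largest log-likelihood before exponentiating avoids underflow when the observation sequence is long.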